fix(webapp): use composite keyset cursor for run pagination#3852
Conversation
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR fixes a correctness bug in keyset pagination for ClickHouseRunsRepository where cursor predicates diverged from the query's composite (created_at, run_id) ordering. The fix introduces a new v2 cursor format encoding the composite key, updates ClickHouse query results to include created_at_ms, refactors repository methods to apply matching composite predicates, and adds backwards compatibility for legacy run_id-only cursors. BulkActionService is updated to consume the new cursor pagination API, and comprehensive integration tests verify forward/backward pagination consistency and legacy cursor handling. 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
44f147e to
9974909
Compare
6b3c823 to
07a3c8e
Compare
@trigger.dev/build
trigger.dev
@trigger.dev/core
@trigger.dev/plugins
@trigger.dev/python
@trigger.dev/react-hooks
@trigger.dev/redis-worker
@trigger.dev/rsc
@trigger.dev/schema-to-json
@trigger.dev/sdk
commit: |
c6c3b4e to
5ee6389
Compare
listRunIds/listRuns order by the composite key (created_at, run_id) but
the cursor predicate cut on run_id alone. That is only sound when run_id
lexicographic order matches created_at order. When a burst of runs is
created such that the two diverge, keyset pagination both re-includes
already-returned runs (duplicates) and drops runs it should return
(skips). For bulk replay this produced duplicate runs; for the dashboard
and runs.list it could silently skip or repeat runs at page boundaries.
- Cursors now encode the composite (created_at, run_id) key as an opaque
URL-safe base64 token and cut on the matching tuple predicate. The
ORDER BY is unchanged, so the table's primary-key alignment (and query
performance) is preserved.
- Cursors are server-issued opaque tokens (the SDK echoes
pagination.next/previous back), so this needs no client update. Legacy
bare-run_id cursors decode to the old run_id-only predicate for
backwards compatibility with in-flight cursors.
- listRunIds is the single cursor-aware list primitive, returning
{ runIds, pagination: { nextCursor, previousCursor } }; listRuns and
bulk actions build on it. Bulk actions finish when nextCursor is null.
- getTaskRunsQueryBuilder selects created_at; the pending-version lookup
keeps its run_id-only schema.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5ee6389 to
7116971
Compare
## Problem Two backward-pagination bugs in `ClickHouseRunsRepository.listRunIds`, both pre-existing (they predate the composite-cursor work in #3852 and were spotted during/after it): **1. Wrong slice (straddled pages).** `listRunRows` fetches `page.size + 1` rows to detect `hasMore`. That extra row is the one *farthest from the cursor* in both directions (forward orders DESC; backward orders ASC), so it's always the *trailing* element. Forward correctly used `rows.slice(0, size)`, but backward+`hasMore` used `rows.slice(1, size + 1)` — dropping the row *closest* to the cursor and keeping the has-more sentinel. The page straddled two logical pages (one run from the correct previous page + one from the page before it), so paging "newer" across a boundary **repeated and skipped** runs. **2. Stranded forward cursor on a partial backward page.** In the backward `!hasMore` branch, `nextCursor` was `reversedRows.at(page.size - 1)`. On a partial page (fewer than `page.size` rows — reachable via `runs.list` by passing a forward page's cursor as `page[before]`), that index overshoots → `undefined` → `nextCursor` becomes `null`, leaving no way to page forward again. ## Fix - **Slice:** both directions now slice `rows.slice(0, size)` (the sentinel is the trailing element either way). - **Partial-page cursor:** the backward `!hasMore` branch takes the oldest row on the page, `rows.at(0)`, for `nextCursor` — equivalent to the old expression for full pages, correct for partial ones. Forward pagination, the cursor *values* for full pages, and the `hasMore === true` paths were already correct and are unchanged. ## Tests `runsRepositoryCursor.test.ts` gains two cases (both fail on `main`, pass here): - **multi-page backward walk:** forward across all pages, then backward from the last page — each backward page must *exactly* reproduce the corresponding forward page (no straddling: `main` returns `{b,c}` instead of `{c,d}`), and the full traversal covers every run once. - **partial backward page:** backward onto a partial first page must still expose a working forward cursor (and paging forward from it reaches the rest) — `main` returns a `null` nextCursor. The three existing cursor tests (forward completeness, backward round-trip, legacy cursor) still pass. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
ClickHouseRunsRepository.listRunIds/listRunsorder results by the composite key(created_at, run_id), but the cursor predicate cut onrun_idalone:This is only sound when
run_idlexicographic order matchescreated_atorder.run_ids are cuids — only coarsely time-sortable — so when a burst of runs is created within a sub-second window, the two orders can diverge. When they do, the next-page predicate (run_id < cursor, wherecursoris the last page element = the smallestcreated_at, not necessarily the smallestrun_id):For bulk replay this caused runs to be replayed more than once (replay has no idempotency guard). For the dashboard and the
runs.listAPI it could silently repeat or skip runs at page boundaries.Fix
Make the cursor predicate match the composite ordering:
(created_at, run_id)key as an opaque URL-safe base64 token (base64url({"c":<createdAtMs>,"r":"<runId>"})), and the query cuts on the matching tuple —(created_at, run_id) < (…)forward /> (…)backward.ORDER BYis unchanged, so the query stays aligned with the table's primary key — no performance regression (the tuple range predicate is actually more index-friendly thanrun_id <alone).pagination.next/pagination.previousback), so this needs no client/SDK update. Legacy cursors were the bare internalrun_id; they're detected by decode failure (a cuid isn't a valid base64-wrapped JSON payload) and fall back to the oldrun_id-only predicate, so in-flight cursors keep working and drain naturally. New cursors also no longer expose a bare internal run id.listRunIdsis now the single cursor-aware list primitive: it returns{ runIds, pagination: { nextCursor, previousCursor } }, andlistRunsbuilds on it (one place constructs cursors). Bulk actions consume the same method and advance bypagination.nextCursor, finishing when it'snull.getTaskRunsQueryBuildernow also selectstoUnixTimestamp64Milli(created_at) AS created_at_ms, using a dedicatedTaskRunListQueryResultschema. The sharedTaskRunV2QueryResultstaysrun_id-only so the run-engine pending-version lookup (getPendingVersionIdsQueryBuilder, which selects onlyrun_id) doesn't fail validation on a column it doesn't query.Tests
New
runsRepositoryCursor.test.ts(testcontainer-backed, real Postgres→ClickHouse replication):run_idorder is the reverse ofcreated_atorder (reproduces the duplicate/skip bug — fails onmain; this walk-until-nextCursor-null-and-assert-complete is exactly the bulk action's iteration),run_idcursor still uses the old predicate (backwards compatibility).The existing
runsRepositorysuites (part1–4) still pass;part4'scount new runs with listRunIdstest was updated for the new{ runIds, pagination }return shape, and theclickhousetaskRunsquery-builder snapshots were regenerated for the addedcreated_at_mscolumn.Notes
listRuns' backward display-slicing (rows.slice(1, size+1)whenhasMore) has an off-by-one that can return a straddled page. Tracked separately.🤖 Generated with Claude Code